MGE Database Development
【Chief members】
Zhang Yangsen, Professor at Beijing Information Science & Technology University
Li Yongwang, Research fellow at the Institute of Coal Chemistry, Chinese Academy of Sciences
Wen Xiaodong, Research fellow at the Institute of Coal Chemistry, Chinese Academy of Sciences, and expert in the Thousand Talent Program for Young Outstanding Scientists
Wang Xingfen, Professor at Beijing Information Science & Technology University
【Research Background】
Materials genome technology has three core elements: high-throughput material computing methods, high-throughput material experiment methods, and material database construction and analysis methods. A database dedicated to Materials Genome Engineering (MGE) is a collection of basic spatial and temporal data on the parameters that interest researchers during material R&D. It is the basis of material design and high-throughput computational simulation, and it also supports the high-throughput experimental design of materials. In turn, the data obtained in high-throughput computational simulation and in the material preparation process, together with the data derived through correlation analysis and computing, enrich the content of the material database. An MGE-dedicated database provides data support for powerful computational analysis and theoretical simulation, and reduces the reliance on physical experiments in the R&D and production of new materials. A dedicated database system supports design, computational simulation, and experimental verification, moves material design and development toward a comprehensive approach based on computing and information technology, and greatly accelerates material R&D.
【Research Objectives】
1. Establish data collection algorithms and material data attribute annotation systems based on multi-source data such as composition, structure, process, property, and service behavior.
2. Design an MGE-dedicated database structure for high-throughput computational simulation and high-throughput material preparation experiments.
3. Establish statistical analysis models and algorithms, as well as association analysis models and algorithms, for the collected material data.
4. Establish a data cluster for big data analysis to provide application services for materials research.
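As an illustration of Objective 1, the sketch below shows one possible way to attach attribute annotations to a material record, grouped by the five categories named above (composition, structure, process, property, service behavior). It is a minimal sketch, not the project's actual annotation system: the class names, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five attribute categories named in Objective 1.
CATEGORIES = ("composition", "structure", "process", "property", "service_behavior")
# Assumed provenance labels for multi-source data (hypothetical).
SOURCES = ("computation", "experiment")


@dataclass
class Annotation:
    category: str  # one of CATEGORIES
    name: str      # hypothetical attribute name, e.g. "Cu_fraction"
    value: float
    unit: str      # empty string for dimensionless quantities
    source: str    # one of SOURCES

    def __post_init__(self) -> None:
        # Reject annotations that fall outside the agreed vocabulary.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")


@dataclass
class MaterialRecord:
    material_id: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, category: str, name: str, value: float,
                 unit: str = "", source: str = "experiment") -> "MaterialRecord":
        self.annotations.append(Annotation(category, name, value, unit, source))
        return self

    def by_category(self) -> Dict[str, List[Annotation]]:
        # Group annotations so every category is present, even when empty.
        grouped: Dict[str, List[Annotation]] = {c: [] for c in CATEGORIES}
        for a in self.annotations:
            grouped[a.category].append(a)
        return grouped


# Made-up example values, for illustration only.
record = MaterialRecord("demo-001")
record.annotate("composition", "Cu_fraction", 0.6, source="experiment")
record.annotate("property", "adsorption_energy", -0.45, unit="eV", source="computation")
```

Validating the category and source at construction time keeps heterogeneous inputs from silently introducing new labels, which is one simple way to enforce a shared annotation vocabulary.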
【Main Research Areas】
1. Acquisition and pre-processing methods for multi-source data, standard technical specifications for the fusion, management, and sharing of heterogeneous material data, and construction of annotation systems for material attribute data.
2. Design of a database architecture dedicated to multi-level, cross-scale material design and high-throughput experimental verification, together with a suitable database storage structure.
3. Application of machine learning and big data analysis technology to association analysis between multi-scale, computational, and experimental material data, to high-precision image processing of material structures, and to unstructured data mining.
4. Construction of a knowledge graph system and knowledge base system for material analysis and computing, establishment of an inference expert system for the R&D of new materials, and establishment of a trend prediction system for new materials R&D.
5. Construction of a MapReduce-based big data computing platform for the design and preparation of new materials.
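The MapReduce model mentioned in item 5 can be illustrated with a small in-memory sketch. The example below computes a per-family mean of a property value in three phases (map, shuffle, reduce). It runs in a single process and only mimics the programming model; the record layout, family names, and values are hypothetical and do not come from the platform described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input records: a material family and one property value.
RECORDS = [
    {"family": "Cu-based", "value": 1.0},
    {"family": "Fe-based", "value": 4.0},
    {"family": "Cu-based", "value": 3.0},
    {"family": "Fe-based", "value": 2.0},
]

Pair = Tuple[str, Tuple[float, int]]


def map_phase(record: dict) -> Iterator[Pair]:
    # Emit (key, (partial sum, partial count)) for each record.
    yield record["family"], (record["value"], 1)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[float, int]]]:
    # Group all emitted values by key, as the framework would between phases.
    grouped: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[Tuple[float, int]]) -> Tuple[str, float]:
    # Combine partial sums and counts into one mean per key.
    total = sum(v for v, _ in values)
    count = sum(n for _, n in values)
    return key, total / count


def run_job(records: Iterable[dict]) -> Dict[str, float]:
    pairs = (pair for r in records for pair in map_phase(r))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


means = run_job(RECORDS)
```

Emitting (sum, count) pairs rather than raw values keeps the reduce step associative, so on a real cluster the same logic could also serve as a combiner.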
【Important Research Progress】
1. The team used the feature engineering concept from machine learning to extract effective descriptors of catalytic material systems from a large amount of computational data. The research approach presented in Figure 1 allows correlations between catalyst characteristics to be investigated in a systematic and convenient manner. The approach was applied to catalytic hydrogen production through methanol steam reforming, an important route for producing hydrogen energy. The reaction rate was obtained with a microkinetic method, which in turn revealed the association between activity and the feature variables. Finally, the team obtained a series of successful descriptors to guide the rational design of the catalyst. The method proposed in this work, which obtains descriptors in a visualized and systematic manner, can be extended to other catalytic systems.
Figure 1 “Feature Engineering”-based research ideas and standards applied to catalysis research
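The idea of relating candidate feature variables to activity can be sketched with a simple ranking by correlation strength. This is only an illustrative stand-in, assuming a linear (Pearson) correlation as the association measure; it is not the team's method, which derives rates from a microkinetic model. The feature names and all numbers below are synthetic.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    # Pearson correlation coefficient; returns 0.0 for a constant series.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_descriptors(features: Dict[str, List[float]],
                     activity: List[float]) -> List[Tuple[str, float]]:
    # Score each candidate feature against activity, strongest |r| first.
    scores = {name: pearson(vals, activity) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic feature values for four hypothetical catalysts.
FEATURES = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 1.0, 4.0, 3.0],
    "feature_c": [5.0, 5.0, 5.0, 5.0],
}
# Synthetic activity values (in practice these would come from the rate model).
ACTIVITY = [2.0, 4.0, 6.0, 8.0]

ranking = rank_descriptors(FEATURES, ACTIVITY)
```

A linear correlation only captures monotonic, roughly linear trends; volcano-type relationships between a descriptor and activity would need a different association measure.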
2. The Beijing Materials Genome Engineering High-tech Innovation Center (Beijing Information Science & Technology University Sub-center) and SYNFUELS CHINA established an industry-university-research collaboration base. The base adopted a rational design mode for catalytic materials that is “mainly based on artificial intelligence computing and big data supplemented with experimental verification.” A high-throughput computational prediction platform is now being developed from intelligent search and prediction software for catalytic materials, a gas-phase reaction thermodynamics and dynamics database, a database of catalytic materials and their physical properties, and data mining software. An integrated catalytic material discovery platform is built on top of it. The base uses the platform to compute, screen, and develop catalytic energy materials and new energy materials. The aim is to accelerate the discovery and development of catalytic energy materials and new energy material systems, and to establish a data expert system for energy-related catalytic processes. A database of carbon-containing substances has been constructed and currently contains more than 20 million data entries. The frameworks for the database and for the data mining software are shown in Figure 2.
Figure 2 Frameworks for material database construction and database mining